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Abstract: Background: Vital signs in our emergency department information system were entered into free-text fields for 
heart rate, respiratory rate, blood pressure, temperature and oxygen saturation. 

Objective: We sought to convert these text entries into a more useful form, for research and QA purposes, upon entry into 
a data warehouse. 

Methods: We derived a series of rules and assigned quality scores to the transformed values, conforming to physiologic 
parameters for vital signs across the age range and spectrum of illness seen in the emergency department. 

Results: Validating these entries revealed that 98% of free-text data had perfect quality scores, conforming to established 
vital sign parameters. Average vital signs varied as expected by age. Degradations in quality scores were most commonly 
attributed logging temperature in Fahrenheit instead of Celsius; vital signs with this error could still be transformed for 
use. Errors occurred more frequently during periods of high triage, though error rates did not correlate with triage volume. 

Conclusions: In developing a method for importing free-text vital sign data from our emergency department information 
system, we now have a data warehouse with a broad array of quality-checked vital signs, permitting analysis and 
correlation with demographics and outcomes. 

Keywords: Data warehouse, electronic health records, emergency medicine, hospital information systems, text mining, user 
computer interface, vital signs. 



INTRODUCTION 

While various methods have been explored to bring structure 
to unstructured health data, or to extract clinically 
meaningful data from unstructured clinical documents [1], 
there is comparatively little reported on the topic of 
converting structured data to a more readily analyzable form. 

In our Emergency Department (ED) information system, 
patient vital signs were transcribed electronically, but in 
separate free-text fields. This practice limits the usefulness 
of the data outside of its immediate, clinical scope. 

Vital signs are an important clinical tool, both for triage 
decisions [2] and for monitoring disease progression and 
therapy efficacy. But vital signs have the potential to aid 
public health goals and research as well. Vital sign data can 
augment syndromic surveillance systems in place in many 
emergency departments [3]. Public health researchers can 
monitor hypertension prevalence in a community through 
collected ED vitals signs data [4,5]. Decision support tools can 
be triggered or adjusted based on vital sign analysis [3,6]. 
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To make use of vital signs for research and quality 
assurance purposes, we embarked upon a project to convert 
vital sign values, recorded as plain text, to numerical format, 
upon their extraction to a data warehouse. 

A 1:1 conversion from text to numeric data, however, 
would result in the inclusion of many erroneous values and 
lead to poor data quality in the warehouse. So we developed 
a process for transforming data exhibiting common errors of 
text entry (such as commas instead of decimal points, or 
temperatures in Celsius instead of Fahrenheit), and assigning 
quality scores to the transformed data based on the steps 
required to deliver physiologic vital signs. 

This effort posed an immediate challenge, as emergency 
department patients include the very young and old, with 
some patients in relatively good health and others in 
extremis. There is likely no other setting in healthcare 
delivery where such variability in vital signs is routinely 
encountered. And while the range of what can be considered 
normal or abnormal for a given age is generally agreed upon 
[7-10], there is little data on the prevalence of abnormal 
vitals, the true range of vitals encountered in the emergency 
department [11], or factors influencing the error rate while 
recording vitals [1 1-14]. 

Thus, to determine the appropriateness of vital sign 
transform rules and determine the quality of imported vital 
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sign data, proposed rules were retrospectively derived from a 
set of emergency department vital signs, and then 
prospectively applied to a wider set. 

We hypothesized that analyzing the free-text entries for 
vital signs would reveal, for the vast majority of cases, 
usable and clinically appropriate information, but that some 
small fraction of data would require transformation (such as 
changing Fahrenheit to Celsius, or deleting an extraneous 
character) to be usable, and some data would be so 
confusingly inappropriate as to not be usable at all. 

Regarding the vital sign data itself, we hypothesized that 
the average vital sign data would conform to expected 
human physiologic parameters, such as heart rate, pulse 
oximetry, and blood pressure changes with respect to age, 
and that daily variations in blood pressure and seasonal 
variations in temperature might be observed. Furthermore, 
we expected this set of vital signs would suggest a particular 
period of the day when errors in entry were most common, 
though it was unclear whether this period of highest errors 
would be associated with the stress that comes with high 
patient census (later afternoons and evenings, in our facility), 
or boredom that comes with low volume (the predawn hours, 
in our ED). 

The wider purpose in reporting our observations here is 
twofold. First, the transformation rules we validated should 
have immediate bearing on the clinical data tasks 
informaticists and information technology (IT) personnel 
find themselves facing today, such as minimizing errors, 
making data available for health information exchange [15], 
demonstrating meaningful use of electronic health records 
[16], or complying with accountable care organization goals 
for blood pressure control in at-risk populations. 

Secondly, in the course of this work, we generated what 
we believe is the first comprehensive data set demonstrating 
the range of emergency department vital signs. Using a data 
warehouse to correlating vitals with comorbidities, 
laboratory results or outcomes data may help identify 
patients at particular risk, and help marshal resources to 
manage at-risk patients more efficiently. 

METHODS 

For the derivation phase of this project, assessments of 
all unique values of Heart Rate (HR), Respiratory Rate (RR), 
Temperature (T), Blood Pressure (BP) and Oxygen 
Saturation (Sp02) were extracted from the ED information 
system (Picis, Wakefield, Massachusetts) over a date range 
of one month (December 2009). In our large urban academic 
emergency department, which includes pediatric patients and 
sees approximately 100,000 visits per year, vitals are 
typically entered by nurses at triage and again by 
technicians, nurses and physicians at regular intervals during 
the patient visit, depending on their complaint and degree of 
illness. 

The vital sign extraction was performed using HL7 
interface and Extract-Transfer-Load (ETL) tools in the 
Mount Sinai Data Warehouse (MSDW). The MSDW 
contains data extracted from twenty of Mount Sinai's clinical 
and financial systems, including all 2.7 million patients in 
the Master Person Index and over one billion facts for these 
patients. Database searches are performed with an internally 
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developed Cohort Query Tool written in Java. The MSDW is 
compliant with HIPAA and New York State privacy and 
security regulations and with institutional policy regarding 
protection of human subjects and participation in research. 

Extract-Transfer-Load (ETL) processes are written in 
Business Objects Data Integrator and Oracle stored 
procedures, and the Rules Engine in the MSDW is an in- 
house developed component of our ETL process. Every 
transaction loaded to MSDW is processed through the Rules 
Engine, and actions are applied according to the nature of the 
transaction and matching rules. These actions are typically a 
data quality score reduction. Rules are formulated by 
extensive profiling of the source data, and in consultation 
with subject matter experts. Typical data quality issues that 
result in quality score deductions include removing special 
characters from free-text data or data that falls outside the 
range experts consider possible. The magnitude of these 
deductions - from 5 points to a full 100 point deduction - are 
based on user's impressions of the meaningfulness of 
imported data. 

Text values for vital signs were analyzed, and rules for 
deducting points from quality scores were developed, as a 
search of the literature, web and our colleagues revealed no 
applicable set of transformation rules. We developed rules 
(see Appendix A) based on commonly accepted ranges for 
human physiology and some of our judgments (NG, KB) as 
emergency physicians, and also as users of the Picis system, 
for what would make a reasonable deduction in score. 

The rules engine was then applied to automatically 
identify, cleanse, and score uploaded vital sign data. This 
validation phase involved applying the derived rules to data 
collected over a period of from January 1, 2010 to July 30, 
2010. An iterative text parsing mechanism was used to 
process the free-text vital sign data, so data type corrections 
could be performed (such as assigning proper units of 
measure). The Rule Engine reduced quality score based on 
the number and types of efforts necessary to process the 
original data. The end result was a cumulative score assigned 
to every vital sign "fact" entered into the data warehouse. 

OBSERVATIONS 

From the derivation cohort of unique values of heart rate, 
respiratory rate, blood pressure (separated here as systolic 
and diastolic values) temperature and pulse oximetry, we 
developed the transform rules (see Appendix A). 

Applying these rules prospectively over a period of 6 
months of collected HL7 messages from the emergency 
department led to the generation of our prospective data set 
of vital signs, which consisted of 497,302 detected vital sign 
fields, from 30,269 distinct medical record numbers 
associated with 41,624 visits. 799 patients had fields 
associated with no usable data (for these cases, typically one 
field would be used to indicate the patient refused or vital 
signs could not be obtained). 

98% of the imported vital signs in the validation set had a 
quality score of 100, with an additional 1.3% of signs 
achieving a score of 95 and 0.3% having a score of 90. 
Scores between 0 and 90 occurred 147 times (0.02%) and 
there were 1079 instances of quality scores of 0 (0.21%). 
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Table 1. Importation Rules and Failure Counts Upon Validation 
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Validation Type 


Failures 


% Total 


<Temperature> consists of Fahrenheit value instead of Celsius value and Quality Score of <5> is decremented 


2396 


40.9% 


<02 Saturation> consists of value outside of 02 Saturation range and Quality Score of <100> is decremented 


896 


15.3% 


<Temperature> consists of values like alphabetical characters, punctuation marks and Quality Score of <5> is decremented 


530 


9.0% 


<Respiration> consists of value outside of Respirations range and Quality Score of <100> is decremented 


354 


6.0% 


<Blood Pressure> consists of value outside of Systolic Blood Pressure range and Quality Score of <100> is decremented 


292 


5.0% 


<Blood Pressure> consists of value outside of Diastolic Blood Pressure range and Quality Score of <100> is decremented 


216 


3.7% 


<Blood Pressure> consists of Diastolic Blood Pressure value greater than Systolic Blood Pressure value and Quality Score of 
<100> is decremented 


204 


3.5% 


<Pulse> consists of value outside of Pulse range and Quality Score of <100> is decremented 


182 


3.1% 


<Temperature> consists of value outside of Celsius range and Quality Score of <100> is decremented 


154 


2.6% 


<02 Saturation> consists of 02 Saturation values in range (X-Y), instead of a single value and Quality Score of <10> is 
decremented 


96 


1.6% 


<Blood Pressure> consists of values like alphabetical characters, punctuation marks (other than V) and Quality Score of <5> is 
decremented 


96 


1.6% 


<Temperature> consists of Fahrenheit value greater than acceptable limit and Quality Score of < 1 0> is decremented 


90 


1.5% 


<Blood Pressure^ contains the value not of the number/number format and Quality Score of <100> is decremented 


90 


1.5% 


<Pulse> consists of pulse values in range (X-Y), instead of a single value and Quality Score of <10> is decremented 


58 


1 .0% 


<Blood Pressure> consists of Systolic Blood Pressure value same as Diastolic Blood Pressure value and Quality Score of <100> is 
decremented 


56 


1.0% 


<Blood Pressure> Blood Pressure contains 0/0 and Quality Score of <100> is decremented 


36 


0.6% 


<Pulse> consists of values like alphabets, punctuation marks etc and Quality Score of <100> is decremented 


34 


0.6% 


<02 Saturation> 02 Saturation Value contains decimal and Quality Score of <5> is decremented 


32 


0.5% 


<Respiration> consists of values like alphabetical characters, punctuation marks and Quality Score of <100> is decremented 


30 


0.5% 


<02 Saturation> consists of values like alphabetical characters, punctuation marks (other than '/') and Quality Score of <5> is 
decremented 


10 


0.2% 


<Respiration> consists of Respiration values in range (X-Y), instead of a single value and Quality Score of <10> is decremented 


8 


0.1% 



The reason most vital signs had quality score points 
deducted was a use of Fahrenheit instead of Celsius for the 
temperature field, and the reason most scores fell below 90 
and hence, outside the range of usable data, was because of 
pulse oximeter values outside physiologic norms or above 
100% (see Table 1). 

Comparatively, deductions in the quality score of heart 
rate, respiratory rate and blood pressure were less common 
that those for temperature and pulse oximetry (see Table 1). 



The average values for respiratory rate, heart rate, blood 
pressure, temperature and pulse oximetry, along with 
standard deviations and range, are shown in Table 2A. 

Heart and respiratory rate were recorded more often than 
blood pressure and pulse oximetry and over twice as often as 
temperature. Rectal temperature was least commonly 
employed but was consistently higher than tympanic 
temperature, perhaps reflecting clinical suspicion of a fever 
before ordering the procedure. 



Table 2A. Statistics on Validated Vital Signs 





Pulse 
(Beats Per Min) 


Respiratory Rate 
(Breaths Per Min) 


Systolic BP 
(mm Hg) 


Diastolic BP 
(mm Hg) 


02 Sat. 

(%) 


Temperature 

CO 


Tympanic 
Temperature (°C) 


Rectal 
Temperature (°C) 


Average 


89.11 


20.08 


129.1 


72.94 


98.07 


36.2 


36.27 


37.63 


Std. Dev. 


23.59 


4.96 


23.94 


13.89 


2.03 


0.77 


0.82 


1.11 


Min 


30 


6 


43 


20 


60 


30 


32 


30.6 


Max 


220 


60 


282 


179 


100 


40.6 


40.5 


40.6 


Patient Count 


30227 


30193 


26987 


26987 


29214 


20836 


19020 


1424 


Total Count 


85744 


84239 


78682 


78682 


81579 


42695 


29564 


1848 
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Table 2B. Statistics on Validated Vital Signs, Stratified by Patient Age 



Age Range (Years) 


Pulse 


Respiratory Rate 


Systolic BP 


Diastolic BP 


02 Saturation 


Temperature (°C) 


Patient Count 


0-10 


126.03 


27.78 


109.05 


62.63 


98.09 


36.64 


5927 


11-20 


89.47 


19.41 


119.86 


67.85 


98.79 


36.22 


3460 


21-30 


86.07 


18.76 


122.91 


72.34 


98.68 


36.20 


4655 


31-40 


84.79 


18.69 


126.06 


75.28 


98.50 


36.17 


3379 


41-50 


84.09 


18.86 


129.87 


76.82 


98.10 


36.15 


3717 


51-60 


83.36 


18.90 


131.49 


75.77 


97.86 


36.12 


3370 


61-70 


82.56 


19.13 


135.33 


73.93 


97.72 


36.13 


2458 


71-80 


81.53 


19.21 


136.39 


70.92 


97.60 


36.17 


1723 


81-90 


80.24 


19.26 


137.98 


70.62 


97.41 


36.15 


1304 


91-100 


79.01 


19.49 


141.70 


71.61 


97.30 


36.15 


331 


>100 


74.48 


17.61 


136.42 


70.29 


97.84 


35.81 


13 


21 + 


83.49 


18.96 


130.97 


73.94 


98.02 


36.15 


21835 


0-21 


115.36 


25.33 


113.83 


64.64 


98.29 


36.51 


8441 



When plotted by age, the vital signs confirm physiologic 
expectations; heart rates and oxygen saturation decline with 
age, while blood pressure rises (Table 2B). 

There was no daily variation in blood pressure and no 
seasonal variation in temperature (data not shown), as others 
have suggested in non-emergency cohorts [17, 18]. 

The time period most associated with low quality scores 
(370 total) was from 9- 10pm, which is ten hours after the 
time of peak intake time (when most patients receive a set of 
vital sign measurements) in our emergency department, but 
at the tail end of the peak occupancy rate [19] (which 
remains consistently high from 3pm until 10pm). The fewest 
low-quality vitals (106 total) were recorded between 5-6am, 
which is associated with both the lowest intake and lowest 
occupancy in the department. The rate of error per patient 
triaged at this early morning hour, however, was 
approximately equal to (in fact, slightly higher than) the 9- 
10pm error rate per patient triaged (0.190 vs 0.178), despite a 
much lower influx of patients and a much lower patient 
census. 

DISCUSSION 

In the process of importing free-text vital sign data from 
our emergency department information system, we now have 
a data warehouse with quality-checked emergency 
department vital signs. We can now analyze what may be the 
broadest set of human vital signs yet reported, encompassing 
neonates to centenarians, the mostly-well to the moribund. 
Prior to processing this data, applying transforms when 
necessary, and importing the data into the MSDW, analysis 
of our ED's vital sign characteristics, such as determining 
standard deviations or averages based on age, was not 
possible. 

The usefulness of this data is manifold. First, just as 
some [20, 21] have argued that clinical lab values are a 
valuable resource for studying outcomes and disease 
progression, so too can vital sign warehousing demonstrate 



significance for health prognostication, especially when 
correlated with outcomes data demographic information, 
diagnoses and comorbidities, laboratory and imaging results. 

Second, our transforms were very useful. In EMR, users 
have demonstrated preference for free-text fields [22] and 
we've shown that we can capture meaningful data from over 
98% of these fields. This system preserves the best of both 
worlds: preserving free-text entry fields preventing 
frustrating data entry experiences in the chaotic environment 
of the ED, and the transforms allow meaningful data to be 
graded or salvaged. 

Furthermore, the transformation rules we validated will 
have immediate application to the clinical data tasks 
informaticists and IT personnel find themselves facing today, 
in making data available for health information exchange, 
demonstration of meaningful use of electronic health 
records, or compliance with accountable care organization 
goals [15,16]. 

We suspect hospital QA efforts may be aided by the 
finding that quality scores per patient triaged are relatively 
low, and by inference, data entry error rates are highest, in 
the pre-dawn hours. At the very least, this finding may help 
guide staffing and break time decisions. 

Prior studies on the usefulness of vital signs in the ED 
have focused on accuracy and precision of measurements. 
Our database can serve as a standard by which future ED 
studies can compare averages, variation and ranges in human 
vital signs. 

The compilation of this data now makes possible 
determinations such as "the average ED patient temperature" or 
"the fraction of patients with abnormal vitals." Additionally, 
simple correlations that have long been suspected, such as pain 
and heart rate [23], may be easily researched. 
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APPENDIX A 

Logic for Vital Sign Transformations 

Pulse: a range between 30 and 250 (outside of which 
quality is set to 0). 

Transformation : 

remove all special characters other than 
' . ' , 

If value reported as 'a-b', take the highest, 
reduce quality score by 10 

If value is decimal, take the whole number 
before the decimal point, set data quality 
score to 0 

If the value has ' % ' take the whole number 
before the ' % ' , set data quality score to 0 

Respiratory Rate: a range between 6 and 60, outside of 
which quality is set to 0. 

Transformation : 

removal all special characters other than ' - 
i 

If value reported as 'a-b', take the highest 
value and reduce quality score by 10 

If value is decimal, set data quality score 
to 0 

Systolic & Diastolic Blood Pressure: A range from 30- 
305 for systole, 20-180 for diastole. 

Transformation : 

Take the value before '/' as ' SBP ' and the 
value after '/' as ' DBP ' 

Range of SBP is 30-305, range of DBP is 20- 
180. Values outside that range are given a 
quality score of 0 . 

if SBP or DBP is left blank or one has 
alphanumeric, then load both in SBP/DBP 
Comments, set quality score to 0. 

Remove ' - ' , ' % ' , ' . ' , and space in between 
the numbers. If these characters are found, 
reduce data quality score by 5 . 

If DBP > SBP or SBP = DBP, set the quality 
score of both to 0 

If SBP/DBP is 0/0, then set the quality score 
to 0 . 

set any 0 or blank to a quality of 0 , as with 
any value less than 20. 

In all cases where the recorded value (after 
stripping any leading and trailing non- 
numeric characters) is not in the form of 
xxx/yyy where xxx>yyy, we stored the full 
entered value in the comment field. There are 



no SBP or DBP facts stored, so there is no 
opportunity to assign a quality score 
to either SDP or DBP, and the quality score 
of the Comment field is set to 0 (giving a 
way to differentiate these comments from 
other comments for future reference). 

Oxygen Saturation: A range from 60 to 100 (with 
anything outside that range set to a quality score of 0). 
Transformation : 

Consider the numeric value before first '%' 
or space. If it's a text character, then load 
the comments with source as-is, set quality 
score to 0 . 

Clinical Result (02 Saturation) should be a 
whole number, else reduce quality score by 5 

Remove '$','*', between Number and 

'%' and if resulting value is in range, 
reduce quality score by 10, else set quality 
score to 0 

For entries marked "UT0" ("unable to obtain") 
Set UT0[zero] to UTO, load into comments, set 
quality score to 0 

Temperature: The Celsius range runs from 29.5 to 41.5 
(outside of which quality score is set to 0). 

Transformation : 

Temperatures recorded in the range between 
85-105 were assumed as Fahrenheit, converted 
to Celsius by subtracting 32 then multiplying 
by 5/9ths. 

For temperatures that come suffixed with 
alpha characters (such as F, C, T, R, 0). We 
stripped off the alpha part and storing the 
numeric part as a numeric fact and the alpha 
part as a Comment (text) fact. 

For temperatures containing punctuation other 
than a decimal point between digits (such as 
a comma): in these cases we changed the 
punctuation to a decimal point. 

Any ' , ' or ' ; ' in values are converted to 
' . ' , reduce quality score by 5 

If the value is above the range, assume a 
missing period and divide it by 10 to get the 
actual value. If the value is If the actual 
value derived is still above range set 
quality score to 0, else reduce quality score 
by 10. 

When value in oF is converted to oC 

If value has 'C' suffix, consider as ' oC ' 

If value like 'a-b', take the highest value 
and reduce quality score by 10. 
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